DEVICE AND PROCESS FOR USE IN ENCODING AUDIO DATA
FIELD OF THE INVENTION 

The present invention relates to a device and process for use in encoding audio data, and in 
particular to a psychoacoustic mask generation process for MPEG audio encoding.

BACKGROUND 

The MPEG-1 audio standard, as described in the International Standards Organisation 
(ISO) document ISO/IEC 11172-3: Information technology - Coding of moving pictures 
and associated audio for digital storage media at up to about 1.5 Mbps (“the MPEG-1
standard”), defines processes for lossy compression of digital audio and video data. The 
MPEG-1 standard defines three alternative processes or “layers” for audio compression, 
providing progressively higher degrees of compression at the expense of increasing 
complexity. The second layer, referred to as MPEG-1-L2, provides an audio compression 
format widely used in consumer multimedia applications. As these applications progress
from providing playback only to also providing recording, a need arises for consumer- 
grade and consumer-priced devices that can generate MPEG-1-L2 compliant audio data. 

The reference implementation for an MPEG-1-L2 encoder described in the MPEG-1 
standard is not suitable for real-time consumer applications, and requires considerable
resources in terms of both memory and processing power. In particular, the 
psychoacoustic masking process used in the MPEG-1-L2 audio encoder referred to uses a 
number of successive and processing intensive power and energy data conversions that 
also incur a repeated loss in precision. 


Accordingly, it is desired to address the above or at least provide a useful alternative.

SUMMARY OF THE INVENTION

In accordance with the present invention there is provided a mask generation process for 
use in encoding audio data, including: 

generating linear masking components from said audio data; 

generating logarithmic masking components from said linear masking components; and
generating a global masking threshold from the logarithmic masking components. 

The present invention also provides a mask generation process for use in encoding audio 
data, including: 

generating respective masking thresholds from logarithmic masking components
using a masking function of the form:

    vf = -17 \cdot dz, \quad 0 \le dz < 8

The present invention also provides a mask generation process for use in encoding audio 
data, including:

generating a global masking threshold from logarithmic masking components 
according to: 

    LT_g(i) = \max\left[ LT_q(i) + \max_{j=1}^{m}\{LT_{tonal}[z(j), z(i)]\} + \max_{j=1}^{n}\{LT_{noise}[z(j), z(i)]\} \right]

where i and j are indices of spectral audio data, z(i) is a Bark scale value for spectral
line i, LT_{tonal}[z(j), z(i)] is a tonal masking threshold for lines i and j, LT_{noise}[z(j), z(i)] is a
non-tonal masking threshold for lines i and j, m is the number of tonal spectral lines, and n
is the number of non-tonal spectral lines.

The present invention also provides a mask generator for an audio encoder, said mask 
generator adapted to generate linear masking components from input audio data,
logarithmic masking components from said linear masking components; and a global
masking threshold from the logarithmic masking components.

The present invention also provides a psychoacoustic masking process for use in an audio
encoder, including: 

generating energy values from Fourier transformed audio data; 
determining sound pressure level values from said energy values; 

selecting tonal and non-tonal masking components on the basis of said energy values;
generating power values from said energy values; 

generating masking thresholds on the basis of said masking components and said 
power values; and 

generating signal to mask ratios for a quantizer on the basis of said sound pressure 
level values and said masking thresholds.

BRIEF DESCRIPTION OF THE DRAWINGS 

Preferred embodiments of the present invention are hereinafter described, by way of 
example only, with reference to the accompanying drawings, wherein: 

Figure 1 is a block diagram of a preferred embodiment of an audio encoder;

Figure 2 is a flow diagram of a prior art process for generating masking data; 

Figure 3 is a flow diagram of a mask generation process executed by a mask 
generator of the audio encoder. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

As shown in Figure 1, an audio encoder 100 includes a mask generator 102, a filter bank 
104, a quantizer 106, and a bit stream generator 108. The audio encoder 100 executes an 
audio encoding process that generates encoded audio data 112 from input audio data 110.
The encoded audio data 112 constitutes a compressed representation of the input audio
data 110.



The audio encoding process executed by the encoder 100 performs encoding steps based 
on MPEG-1-L2 processes described in the MPEG-1 standard. The time-domain input 
audio data 110 is converted into sub-bands by the filter bank 104, and the resulting
frequency-domain data is then quantized by the quantizer 106. The bit stream generator
108 then generates encoded audio data or bitstream 112 from the quantized data. The
quantizer 106 performs bit allocation and quantization based upon masking data generated
by the mask generator 102. The masking data is generated from the input audio data 110
on the basis of a psychoacoustic model of human hearing and aural perception. The
psychoacoustic modelling takes into account the frequency-dependent thresholds of human
hearing, and a psychoacoustic phenomenon referred to as masking, whereby a strong
frequency component close to one or more weaker frequency components tends to mask
the weaker components, rendering them inaudible to a human listener. This makes it
possible to omit the weaker frequency components when encoding audio data, and thereby
achieve a higher degree of compression, without adversely affecting the perceived quality
of the encoded audio data 112. The masking data comprises a signal-to-mask ratio value
for each frequency sub-band. These signal-to-mask ratio values represent the amount of
signal masked by the human ear in each frequency sub-band. The quantizer 106 uses this
information to decide how best to use the available number of data bits to represent the
input audio signal 110.

In known or prior art MPEG-1-L2 encoders, the generation of masking data has been
found to be the most computationally intensive component of the encoding process,
representing up to 50% of the total processing resources. The MPEG-1 standard provides
two example implementations of the psychoacoustic model: psychoacoustic model 1
(PAM1) is less complex and makes more compromises on quality than psychoacoustic
model 2 (PAM2). PAM2 has better performance for lower bit rates. Nonetheless, quality
tests indicate that PAM1 can achieve good quality encoding at high bit rates such as 256
and 384 kbps. However, PAM1 is implemented in floating point arithmetic and is not
optimized for chip-based encoders. As described in G.A. Davidson et al., Parametric Bit
Allocation in a Perceptual Audio Coder, 97th Convention of Audio Engineering Society,
November 1994, it has been estimated that PAM1 demands more than 30 MIPS of
computing power per channel.



Moreover, despite using the C double precision type throughout, the ISO implementation
performs an extremely large number of arithmetic operations, each of which incurs a loss
of precision in the psychoacoustic masking data generation process.

The psychoacoustic mask generation process 300 executed by the mask generator 102
provides an implementation of the psychoacoustic model that maintains quality whilst 
significantly reducing the computational requirements. 

In order to most clearly describe the advantages of the psychoacoustic mask generation 
process 300, the steps of the process are described below with reference to a prior art
process 200 for generating psychoacoustic masking data, as described in the MPEG-1 
standard. 

In the described embodiment, the audio encoder is a standard digital signal processor 
(DSP) such as a TMS320 series DSP manufactured by Texas Instruments. The audio
encoding modules 102 to 108 of the encoder 100 are software modules stored in the 
firmware of the DSP-core. However, it will be apparent that at least part of the audio 
encoding modules 102 to 108 could alternatively be implemented as dedicated hardware 
components such as application-specific integrated circuits (ASICs). 


As shown in Figures 2 and 3, both the psychoacoustic mask generation process 300 and the 
prior art process 200 for generating masking data begin by Hann windowing the 512- 
sample time-domain input audio data frame 110 at step 204. The Hann windowing 
effectively centers the 512 samples between the previous samples and the subsequent 
samples, using a Hann window to provide a smooth taper. This reduces ringing edge
artefacts that would otherwise be produced at step 206 when the time-domain audio data
110 is converted to the frequency domain using a 1024-point fast Fourier transform (FFT).
At step 208, an array of 512 energy values for respective frequency sub-bands is then 
generated from the symmetric array of 1024 FFT output values, according to: 

    E(n) = |X(n)|^2 = X_R^2(n) + X_I^2(n),

where X(n) = X_R(n) + i X_I(n) is the FFT output of the nth spectral line.
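The energy computation at step 208 can be sketched as follows. This is a minimal illustration only: it assumes the FFT output is already available as a list of Python complex numbers, and the function names `energy_values` and `half_spectrum_energy` are hypothetical.

```python
def energy_values(fft_output):
    """Return E(n) = X_R(n)^2 + X_I(n)^2 for each complex FFT output value."""
    return [x.real * x.real + x.imag * x.imag for x in fft_output]


def half_spectrum_energy(fft_output):
    """Keep the first half of a symmetric FFT output (e.g. 512 of 1024 values)."""
    half = len(fft_output) // 2
    return energy_values(fft_output[:half])
```

No logarithm is evaluated at this stage, so the values stay in the linear domain.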

In this specification, a value or entity is described as logarithmic or as being in the 
logarithmic-domain if it has been generated as the result of evaluating a logarithmic 
function. When a logarithmic value or entity is exponentiated by the reverse operation, it is
described as linear or as being in the linear-domain. 

In the prior art process 200, the linear energy values E(n) are then converted into
logarithmic power spectral density (PSD) values P(n) at step 210, according
to P(n) = 10 \log_{10} E(n), and the linear energy values E(n) are not used again. The PSD
values are normalised to 96 dB at step 212.

Steps 210 and 212 are omitted from the mask generation process 300. 

The next step in both processes is to generate sound pressure level (SPL) values for each
sub-band. In the prior art process, an SPL value L_{sb}(n) is generated for each sub-band n at
step 214, according to:

    L_{sb}(n) = \mathrm{MAX}\left[ X_{spl}(n),\; 20 \log_{10}(scf_{max}(n) \cdot 32768) - 10 \right] dB

and

    X_{spl}(n) = 10 \log_{10}\left( \sum_k 10^{X(k)/10} \right) dB

where scf_{max}(n) is the maximum of the three scale factors of sub-band n within an
MPEG-1-L2 audio frame comprising 1152 stereo samples, X(k) is the PSD value of index
k, and the summation over k is limited to values of k within sub-band n. The "-10 dB" term
corrects for the difference between peak and RMS levels.

Significantly, the prior art generation of SPL values involves evaluating many exponentials 
and logarithms in order to convert logarithmic power values to linear energy values, sum
them, and then convert the summed linear energy values back to logarithmic power values.
Each conversion between the logarithmic and linear domains is computationally expensive 
and degrades the precision of the result. 
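The round trip described above can be sketched as follows, to make the cost visible. This is an illustrative sketch, not the ISO reference code; the function name `prior_art_x_spl` is hypothetical, and the input is assumed to be the list of logarithmic PSD values X(k) belonging to one sub-band.

```python
import math


def prior_art_x_spl(psd_db):
    """Sum logarithmic PSD values of one sub-band the prior art way.

    Each value is exponentiated back to linear energy, the energies are
    summed, and the sum is converted to the logarithmic domain again.
    """
    linear_sum = sum(10.0 ** (x / 10.0) for x in psd_db)  # one exponential per line
    return 10.0 * math.log10(linear_sum)                  # one logarithm per sub-band
```

For a sub-band of N spectral lines this costs N exponentials and one logarithm, on top of the N logarithms already spent producing the PSD values.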

In the mask generation process 300, L_{sb}(n) is generated at step 302 using the same first
formula for L_{sb}(n), but with:

    X_{spl}(n) = 10 \log_{10}\left( \sum_k X(k) \right) + 96 dB

where X(k) is the linear energy value of index k. The "96 dB" term is used to
normalise L_{sb}(n). It will be apparent that this improves upon the prior art by avoiding
exponentiation. Moreover, the efficiency of generating the SPL values is significantly
improved by approximating the logarithm by a second order Taylor expansion.

Specifically, representing the argument of the logarithm as Ipt, this is first normalised by
determining x such that:

    Ipt = (1 - x) 2^m, \quad 0.5 < 1 - x \le 1

Using a second order Taylor expansion,

    \ln(1 - x) \approx -x - x^2/2

the logarithm can be approximated as:

    \log_{10}(Ipt) \approx [m \cdot \ln(2) - (x + x^2/2)] \cdot \log_{10}(e)

                   = [m * ln(2) - (x + x * x * 0.5)] * log10(e)

Thus the logarithm is approximated by four multiplications and two additions, providing a
significant improvement in computational efficiency.
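The approximation above can be sketched in floating point as follows. This is a minimal illustration under stated assumptions: `math.frexp` is used to obtain the normalisation (a fixed-point DSP would more likely count leading zeros), and the name `approx_log10` is hypothetical.

```python
import math

LN2 = math.log(2.0)
LOG10_E = math.log10(math.e)


def approx_log10(ipt):
    """Approximate log10(ipt) using ln(1 - x) ~ -x - x^2/2 after normalisation."""
    if ipt <= 0.0:
        raise ValueError("argument must be positive")
    # frexp gives ipt = mantissa * 2**m with 0.5 <= mantissa < 1.
    mantissa, m = math.frexp(ipt)
    if mantissa == 0.5:
        # Shift exact powers of two so that 0.5 < 1 - x <= 1 holds, as in the text.
        mantissa, m = 1.0, m - 1
    x = 1.0 - mantissa
    # Four multiplications and two additions/subtractions.
    return (m * LN2 - (x + x * x * 0.5)) * LOG10_E
```

With this normalisation the result is exact for powers of two. The truncation error is largest as 1 - x approaches 0.5, where it is just under 0.03 in log10 units, or roughly 0.3 dB once multiplied by 10.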
The next step is to identify frequency components for masking. Because the tonality of a
masking component affects the masking threshold, tonal and non-tonal (noise) masking 
components are determined separately. 



First, local maxima are identified. A spectral line X(k) is deemed to be a local maximum
if

    X(k) > X(k-1) and X(k) \ge X(k+1)

In the prior art process 200, a local maximum X(k) thus identified is selected as a
logarithmic tonal masking component at step 216 if:

    X(k) - X(k+j) \ge 7 dB

where j is a searching range that varies with k. If X(k) is found to be a tonal component,
then its value is replaced by:

    X_{tonal}(k) = 10 \log_{10}\left( 10^{X(k-1)/10} + 10^{X(k)/10} + 10^{X(k+1)/10} \right)

All spectral lines within the examined frequency range are then set to -\infty dB.

In the mask generation process 300, a local maximum X(k) is selected as a linear tonal
masking component at step 304 if:

    X(k) \cdot 10^{-0.7} \ge X(k+j)

If X(k) is found to be a tonal component, then its value is replaced by:

    X_{tonal}(k) = X(k-1) + X(k) + X(k+1)

All spectral lines within the examined frequency range are then set to 0.
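The linear-domain tonal test can be sketched as follows. This is an illustration only: the search offsets j depend on k in a way defined by the MPEG-1 standard and not reproduced here, so they are passed in as a hypothetical `offsets` parameter, and the function names are likewise assumptions. Zeroing of the examined range is left out for brevity.

```python
TONAL_RATIO = 10.0 ** -0.7  # 7 dB expressed as a linear energy ratio


def is_local_maximum(energy, k):
    """X(k) > X(k-1) and X(k) >= X(k+1), on linear energy values."""
    return energy[k] > energy[k - 1] and energy[k] >= energy[k + 1]


def select_linear_tonal(energy, k, offsets):
    """Return the merged tonal energy at line k, or None if k is not tonal.

    offsets: iterable of search offsets j for this k (assumed input).
    """
    if not is_local_maximum(energy, k):
        return None
    # Tonal only if X(k) * 10^-0.7 >= X(k+j) for every offset j.
    if any(energy[k] * TONAL_RATIO < energy[k + j] for j in offsets):
        return None
    return energy[k - 1] + energy[k] + energy[k + 1]
```

The 7 dB comparison becomes one multiplication by a constant, and merging the three lines becomes two additions, with no exponential or logarithm evaluated.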



The next step in either process is to identify and determine the intensity of non-tonal
masking components within the bandwidth of critical sub-bands. For a given frequency,
the smallest band of frequencies around that frequency which activate the same part of the
basilar membrane of the human ear is referred to as a critical band. The critical bandwidth
represents the ear's resolving power for simultaneous tones. The bandwidth of a sub-band
varies with the center frequency of the specific critical band. As described in the MPEG-1
standard, 26 critical bands are used for a 48 kHz sampling rate. The non-tonal (noise)
components are identified from the spectral lines remaining after the tonal components are
removed as described above.

At step 218 of the prior art process 200, the logarithmic powers of the remaining spectral
lines within each critical band are converted to linear energy values, summed and then
converted back into a logarithmic power value to provide the SPL of the new non-tonal
component X_{noise}(k) corresponding to that critical band. The number k is the index
number of the spectral line nearest to the geometric mean of the critical band.

In the mask generation process 300, the energies of the remaining spectral lines within each
critical band are summed at step 306 to provide the new non-tonal component X_{noise}(k)
corresponding to that critical band:

    X_{noise}(k) = \sum_k X(k)

for k in sub-band n. Only addition is used, and no exponential or logarithmic evaluations
are required, providing a significant improvement in efficiency.
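The summation at step 306 can be sketched as follows. This is a sketch under assumptions: the critical band is given by hypothetical inclusive line indices `first` and `last`, and the component index is taken as the line nearest the geometric mean of those two edges, following the convention stated for the prior art process.

```python
import math


def non_tonal_component(energy, first, last):
    """Sum the remaining linear energies of one critical band.

    Returns (index, energy), where index is the spectral line nearest to
    the geometric mean of the band edges first..last (inclusive).
    """
    total = sum(energy[first:last + 1])
    index = round(math.sqrt(first * last))
    return index, total
```

For example, a band spanning lines 2 to 8 is represented by a single component at line 4.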

The next step is to decimate the tonal and non-tonal masking components. Decimation is a 
procedure that is used to reduce the number of masking components that are used to 
generate the global masking threshold. 


In the prior art process 200, logarithmic tonal components X_{tonal}(k) and non-tonal
components X_{noise}(k) are selected at step 220 for subsequent use in generating the
masking threshold only if:

    X_{tonal}(k) \ge LT_q(k) or X_{noise}(k) \ge LT_q(k)

respectively, where LT_q(k) is the absolute threshold (or threshold in quiet) at the
frequency of index k; threshold in quiet values in the logarithmic domain are provided in
the MPEG-1 standard.

Decimation is performed on two or more tonal components that are within a distance of
less than 0.5 Bark, where the Bark scale is a frequency scale on which the frequency
resolution of the ear is approximately constant, as described in E. Zwicker, Subdivision of
the Audible Frequency Range into Critical Bands, J. Acoustical Society of America, vol.
33, p. 248, February 1961. The tonal component with the highest power is kept while the
smaller component(s) are removed from the list of selected tonal components. For this
operation, a sliding window in the critical band domain is used with a width of 0.5 Bark.

In the mask generation process 300, linear components are selected at step 308 only if:

    X_{tonal}(k) \ge LT_qE(k) or X_{noise}(k) \ge LT_qE(k)

where LT_qE(k) are taken from a linear-domain absolute threshold table pre-generated
from the logarithmic domain absolute threshold table LT_q(k) according to:

    LT_qE(k) = 10^{[LT_q(k) - 96]/10}

where the "-96" term represents denormalization.
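Pre-generating the linear-domain table and applying the selection test can be sketched as follows. The threshold values themselves come from the MPEG-1 standard and are not reproduced here; the function names are hypothetical.

```python
def linear_threshold_table(ltq_db):
    """Convert a logarithmic threshold-in-quiet table to linear energy.

    Applies LT_qE(k) = 10 ** ((LT_q(k) - 96) / 10); intended to run once,
    offline, so no exponentials are evaluated per frame.
    """
    return [10.0 ** ((t - 96.0) / 10.0) for t in ltq_db]


def keep_component(component_energy, k, ltq_linear):
    """Keep a linear masking component only if it reaches the threshold at k."""
    return component_energy >= ltq_linear[k]
```

Because the table is fixed for a given sampling rate, the per-frame cost of this step is one comparison per component.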



After denormalization, the spectral data in the linear energy domain are converted into the 
logarithmic power domain at step 310. In contrast to step 206 of the prior art process, the 
evaluation of logarithms is performed using the efficient second-order approximation
method described above. This conversion is followed by normalization to the reference 
level of 96 dB at step 212. 



Having selected and decimated masking components, the next step is to generate
individual masking thresholds. Of the original 512 spectral data values, indexed by k, only
a subset, indexed by i, is subsequently used to generate the global masking threshold, and
this step determines that subset by subsampling, as described in the MPEG-1 standard.

The number of lines n in the subsampled frequency domain depends on the sampling rate.
For a sampling rate of 48 kHz, n = 126. Every tonal and non-tonal component is assigned
an index i that most closely corresponds to the frequency of the corresponding spectral line
in the original (i.e., before sub-sampling) spectral data.

The individual masking thresholds of both tonal and non-tonal components, LT_{tonal} and
LT_{noise}, are then given by the following expressions:

    LT_{tonal}[z(j), z(i)] = X_{tonal}[z(j)] + av_{tonal}[z(j)] + vf[z(j), z(i)] dB

    LT_{noise}[z(j), z(i)] = X_{noise}[z(j)] + av_{noise}[z(j)] + vf[z(j), z(i)] dB

where i is the index corresponding to a spectral line at which the masking threshold is
generated and j is that of a masking component; z(i) is the Bark scale value of the i-th
spectral line while z(j) is that of the j-th line; and terms of the form X[z(j)] are the SPLs
of the (tonal or non-tonal) masking component. The term av, referred to as the masking
index, is given by:

    av_{tonal} = -1.525 - 0.275 \cdot z(j) - 4.5 dB
    av_{noise} = -1.525 - 0.175 \cdot z(j) - 0.5 dB

vf is a masking function of the masking component and is characterized by different lower
and upper slopes, depending on the distance in Bark scale dz, where dz = z(i) - z(j).

In the prior art process 200, individual masking thresholds are generated at step 222 using
a masking function vf given by:

    vf = 17 \cdot (dz + 1) - 0.4 \cdot X[z(j)] - 6 dB, for -3 \le dz < -1 Bark

    vf = \{0.4 \cdot X[z(j)] + 6\} \cdot dz dB, for -1 \le dz < 0 Bark

    vf = -17 \cdot dz dB, for 0 \le dz < 1 Bark

    vf = -17 \cdot dz + 0.15 \cdot X[z(j)] \cdot (dz - 1) dB, for 1 \le dz < 8 Bark

where X[z(j)] is the SPL of the masking component with index j. No masking threshold is
generated if dz < -3 Bark, or dz ≥ 8 Bark.

The evaluation of the masking function vf is the most computationally intensive part of this
step of the prior art process. The masking function can be categorized into two types:
downward masking (when dz < 0) and upward masking (when dz ≥ 0). As described in
Davis Pan, A Tutorial on MPEG/Audio Compression, IEEE Journal on Multimedia, 1995,
downward masking is considerably less significant than upward masking. Consequently,
only upward masking is used in the mask generation process 300. Moreover, further
analysis shows that the second term in the masking function for 1 ≤ dz < 8 Bark is
typically approximately one tenth of the first term, -17 * dz. Consequently, the second term
can be safely discarded.

Accordingly, the mask generation process 300 generates individual masking thresholds at
step 312 using a single expression for the masking function vf, as follows:

    vf = -17 \cdot dz, \quad 0 \le dz < 8

This greatly reduces the computational load while maintaining good quality encoding. The
masking index av is not modified from that used in the prior art process, because it makes
a significant contribution to the individual masking threshold LT and is not
computationally demanding.
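Step 312 can be sketched as follows, combining the unmodified masking index av with the simplified masking function. The function names are hypothetical; returning `None` outside 0 ≤ dz < 8 is an illustrative way of signalling that no threshold is generated.

```python
def av_tonal(z_j):
    """Masking index for a tonal masker at Bark value z_j, in dB."""
    return -1.525 - 0.275 * z_j - 4.5


def av_noise(z_j):
    """Masking index for a non-tonal masker at Bark value z_j, in dB."""
    return -1.525 - 0.175 * z_j - 0.5


def vf_simplified(dz):
    """Upward-only masking function: -17 * dz for 0 <= dz < 8, else None."""
    if 0.0 <= dz < 8.0:
        return -17.0 * dz
    return None


def individual_threshold(x_db, z_j, z_i, tonal):
    """LT[z(j), z(i)] = X[z(j)] + av[z(j)] + vf, or None if out of range."""
    vf = vf_simplified(z_i - z_j)
    if vf is None:
        return None
    av = av_tonal(z_j) if tonal else av_noise(z_j)
    return x_db + av + vf
```

For a 60 dB tonal masker at 10 Bark, the threshold one Bark above it is 60 - 8.775 - 17 = 34.225 dB, while no threshold is produced one Bark below it.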

After the individual masking thresholds have been generated, a global masking threshold is 
generated. 

In the prior art process 200, the global masking threshold LT_g(i) at the i-th frequency
sample is generated at step 224 by summing the powers corresponding to the individual
masking thresholds and the threshold in quiet, according to:

    LT_g(i) = 10 \log_{10}\left( 10^{LT_q(i)/10} + \sum_{j=1}^{m} 10^{LT_{tonal}[z(j), z(i)]/10} + \sum_{j=1}^{n} 10^{LT_{noise}[z(j), z(i)]/10} \right)

where m is the total number of tonal masking components, and n is the total number of
non-tonal masking components. The threshold in quiet LT_q is offset by -12 dB for bit
rates ≥ 96 kbps per channel.




It will be apparent that this step is computationally demanding due to the number of 
exponentials and logarithms that are evaluated. 



In the mask generation process 300, these evaluations are avoided and smaller terms are
not used. The global masking threshold LT_g(i) at the i-th frequency sample is generated at
step 314 by comparing the powers corresponding to the individual masking thresholds and
the threshold in quiet, as follows:

    LT_g(i) = \max\left[ LT_q(i) + \max_{j=1}^{m}\{LT_{tonal}[z(j), z(i)]\} + \max_{j=1}^{n}\{LT_{noise}[z(j), z(i)]\} \right]

The largest of the tonal masking components and the largest of the non-tonal masking
components are identified. They are then compared with LT_q(i). The maximum of these
three values is selected as the global masking threshold at the i-th frequency sample. This
reduces computational demands at the expense of occasional over allocation. As above, the
threshold in quiet LT_q is offset by -12 dB for bit rates ≥ 96 kbps per channel.
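The selection at step 314 can be sketched as follows. The sketch follows the prose description, taking the maximum of three values (the threshold in quiet, the largest tonal threshold and the largest non-tonal threshold). The function name and the handling of empty lists are assumptions.

```python
def global_threshold(ltq_i, tonal_thresholds, noise_thresholds):
    """Return LT_g(i) as the largest of three candidate values, in dB.

    ltq_i: threshold in quiet at sample i (already offset if required).
    tonal_thresholds / noise_thresholds: individual thresholds at sample i;
    either list may be empty when no masker reaches this sample.
    """
    candidates = [ltq_i]
    if tonal_thresholds:
        candidates.append(max(tonal_thresholds))
    if noise_thresholds:
        candidates.append(max(noise_thresholds))
    return max(candidates)
```

Only comparisons are needed, in place of the m + n + 1 exponentials and one logarithm of the prior art summation.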



Finally, signal-to-mask ratio values are generated at step 226 of both processes. First, the
minimum masking level LT_{min}(n) in sub-band n is determined by the following
expression:

    LT_{min}(n) = \mathrm{Min}[LT_g(i)] dB, for f(i) in sub-band n,

where f(i) is the i-th frequency line within sub-band n. A minimum masking threshold
LT_{min}(n) is determined for every sub-band. The signal-to-mask ratio for every sub-band n
is then generated by subtracting the minimum masking threshold of that sub-band from the
corresponding SPL value:

    SMR_{sb}(n) = L_{sb}(n) - LT_{min}(n)
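The final step can be sketched as follows, assuming the global thresholds of the frequency lines that fall in sub-band n have already been gathered into a list. The function name is hypothetical.

```python
def signal_to_mask_ratio(l_sb_n, ltg_in_subband):
    """Return SMR_sb(n) = L_sb(n) - LT_min(n), in dB.

    l_sb_n: sound pressure level of sub-band n.
    ltg_in_subband: global masking thresholds LT_g(i) for lines in sub-band n.
    """
    lt_min = min(ltg_in_subband)
    return l_sb_n - lt_min
```

A sub-band with an SPL of 70 dB and thresholds of 40, 35.5 and 50 dB yields an SMR of 34.5 dB.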


The mask generator 102 sends the signal-to-mask ratio data SMR_{sb}(n) for each sub-band n
to the quantizer 106, which uses it to determine how to most effectively allocate the
available data bits and quantize the spectral data, as described in the MPEG-1 standard.

Many modifications will be apparent to those skilled in the art without departing from the
scope of the present invention as herein described with reference to the accompanying 
drawings. 







CLAIMS: 

1. A mask generation process for use in encoding audio data, including:

generating linear masking components from said audio data;

generating logarithmic masking components from said linear masking
components; and

generating a global masking threshold from the logarithmic masking
components.

2. A mask generation process as claimed in claim 1, wherein said step of generating linear
masking components includes:

generating linear components in a frequency domain from said audio data;

selecting a first subset of said linear components as linear tonal components;
and

selecting a second subset of said linear components as linear non-tonal
components.

3. A mask generation process as claimed in claim 2, including generating sound pressure
levels from said linear components using a second-order Taylor expansion of a
logarithmic function.

4. A mask generation process as claimed in claim 1, wherein said logarithmic masking 
components are generated using a second-order Taylor expansion of a logarithmic 
function. 


5. A mask generation process as claimed in claim 3 or 4, including generating a 
normalised value corresponding to an argument of said logarithmic function, and using 
said normalised value in said Taylor expansion. 

6. A mask generation process as claimed in claim 5, including:

generating said normalised value x for said argument Ipt, according to:

    Ipt = (1 - x) 2^m, \quad 0.5 < 1 - x \le 1

and using a second order Taylor expansion of the form

    \ln(1 - x) \approx -x - x^2/2

to approximate said logarithmic function as:

    \log_{10}(Ipt) \approx [m \cdot \ln(2) - (x + x^2/2)] \cdot \log_{10}(e)

7. A mask generation process as claimed in claim 2, wherein said step of generating a
global masking threshold includes:

decimating said linear tonal components and said linear non-tonal components;
and

generating masking thresholds from the decimated linear tonal components and
the decimated linear non-tonal components.

8. A mask generation process as claimed in claim 1, including generating masking
thresholds from said logarithmic masking components using a masking function of the
form:

    vf = -17 \cdot dz, \quad 0 \le dz < 8

9. A mask generation process as claimed in claim 7, wherein said step of generating a
global masking threshold includes determining maximum components of said masking
thresholds and predetermined threshold values.

10. A mask generation process as claimed in claim 9, wherein said global masking
threshold is generated according to:

    LT_g(i) = \max\left[ LT_q(i) + \max_{j=1}^{m}\{LT_{tonal}[z(j), z(i)]\} + \max_{j=1}^{n}\{LT_{noise}[z(j), z(i)]\} \right]

where i and j are indices of logarithmic power components, z(i) is a Bark scale value
for logarithmic power component i, LT_{tonal}[z(j), z(i)] is a tonal masking threshold for
logarithmic power components i and j, LT_{noise}[z(j), z(i)] is a non-tonal masking
threshold for logarithmic power components i and j, m is the number of tonal
logarithmic power components, and n is the number of non-tonal logarithmic power
components.

11. A mask generation process for use in encoding audio data, including:

generating respective masking thresholds from logarithmic masking components
using a masking function of the form:

    vf = -17 \cdot dz, \quad 0 \le dz < 8

12. A mask generation process for use in encoding audio data, including:

generating a global masking threshold from logarithmic masking components
according to:

    LT_g(i) = \max\left[ LT_q(i) + \max_{j=1}^{m}\{LT_{tonal}[z(j), z(i)]\} + \max_{j=1}^{n}\{LT_{noise}[z(j), z(i)]\} \right]

where i and j are indices of spectral audio data, z(i) is a Bark scale value for
spectral line i, LT_{tonal}[z(j), z(i)] is a tonal masking threshold for lines i and j,
LT_{noise}[z(j), z(i)] is a non-tonal masking threshold for lines i and j, m is the number
of tonal spectral lines, and n is the number of non-tonal spectral lines.

13. A mask generation process as claimed in any one of claims 1 to 12, wherein said linear
masking components include linear energy components, and said logarithmic masking
components include logarithmic power components.

14. A mask generation process as claimed in any one of claims 1 to 12, wherein said 
process is an MPEG-1 layer 2 audio encoding process. 

15. A mask generator having components for executing the steps of any one of claims 1 to
14.

16. An audio encoder having components for executing the steps of any one of claims 1 to 
14. 

17. A computer readable storage medium having stored thereon program code for
executing the steps of any one of claims 1 to 14.

18. A mask generator for an audio encoder, said mask generator adapted to generate linear
masking components from input audio data, logarithmic masking components from
said linear masking components; and a global masking threshold from the logarithmic
masking components.

19. A psychoacoustic masking process for use in an audio encoder, including:

generating energy values from Fourier transformed audio data;

determining sound pressure level values from said energy values;

selecting tonal and non-tonal masking components on the basis of said energy values;

generating power values from said energy values;

generating masking thresholds on the basis of said masking components and said
power values; and

generating signal to mask ratios for a quantizer on the basis of said sound pressure
level values and said masking thresholds.


20. An MPEG-1-L2 encoder adapted to execute the masking process of claim 19. 



ABSTRACT

DEVICE AND PROCESS FOR USE IN ENCODING AUDIO DATA

A mask generation process for use in encoding audio data, including generating linear
masking components from the audio data, generating logarithmic masking components
from the linear masking components, and generating a global masking threshold from the
logarithmic masking components. The process is a psychoacoustic masking process for
use in an MPEG-1-L2 encoder, and includes generating energy values from a Fourier
transform of the audio data, determining sound pressure level values from the energy
values, selecting tonal and non-tonal masking components on the basis of the energy
values, generating power values from the energy values, generating masking thresholds on
the basis of the masking components and the power values, and generating signal to mask
ratios for a quantizer on the basis of the sound pressure level values and the masking
thresholds.
Figure 1

Figure 2

Figure 3